**NARAYANAN**

**Implementation of Adaptive Data Prefetching in L2 Caches**

January 2014 – May 2014

Designed a Stride Filtered Markov prefetcher for the L2 data cache of an x86-based CPU using the gem5 full-system simulator and carried out performance and power analysis.  
Achieved up to 15% throughput improvement over no prefetching and 6% over the existing scheme across a set of compute- and pointer-intensive SPEC CPU 2006, PARSEC, SPLASH-2 and Rodinia benchmarks.

**Implementation of MIPS Processor (16 bit) using Altera – Quartus II**

September 2013 – December 2013

Implemented MIPS arithmetic, memory and control instructions on a 5-stage pipelined RISC processor system with forwarding and register bypassing features.  
Designed a direct-mapped write-back data cache using Verilog HDL.  
Validated the overall functionality of the processor using the Quartus II CAD tool.

**Design of Programmable Infinite Impulse Response Filters**

September 2013 – December 2013

Designed a special-purpose digital signal processor using Verilog HDL and CAD tools (ModelSim, Synopsys Design Vision) to implement PIIR filter operation.  
Simulated and synthesized various designs using different combinations of adders and multipliers to optimize the design for area and throughput.

**Operating System Libraries**

January 2014 – May 2014

Developed a command line interpreter to implement redirection, pipeline, and built-in UNIX commands.  
Implemented a memory allocator for the heap of a user-level process to perform the functions of malloc() and free().  
Wrote multi-threaded code using a spin lock and compared it against a pthread lock.  
Built a multithreaded web server based on HTTP with different scheduling policies such as FIFO, Smallest File First (SFF) and Smallest File First with Bounded Starvation (SFF-BS).  
Developed a UDP-based file server and client library to support file handling.  
  

**Solving Linear Systems on GPU using CUDA**

September 2013 – December 2013

Wrote parallel CUDA code to solve large linear systems on NVIDIA GTX480 and K20X devices and verified the results against the Intel MKL banded solver.  
The implementation was based on the conjugate gradient method, using device functions for multiplication, SAXPY and dot product, and a host function for scalar operations.  
Analyzed the algorithm using profiling and scaling analysis on NVIDIA Visual Profiler to achieve...
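
For illustration only, the conjugate gradient structure described above (a matrix-vector multiply, SAXPY and dot product as building blocks, with scalar updates in the driver loop) might look like the sketch below. This is a plain-Python CPU sketch, not the project's CUDA code; the function names `matvec`, `saxpy`, `dot` and `conjugate_gradient`, the tolerance, and the dense matrix representation are all assumptions made for the example.

```python
def matvec(A, x):
    """Dense matrix-vector product (stands in for the device multiply)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def saxpy(alpha, x, y):
    """Return alpha * x + y, element by element."""
    return [alpha * a + b for a, b in zip(x, y)]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (hypothetical helper)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)       # scalar step length
        x = saxpy(alpha, p, x)            # x <- x + alpha * p
        r = saxpy(-alpha, Ap, r)          # r <- r - alpha * A p
        rs_new = dot(r, r)
        p = saxpy(rs_new / rs_old, p, r)  # p <- r + beta * p
        rs_old = rs_new
    return x
```

Calling `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns a solution close to `[1/11, 7/11]`. In a GPU version the three vector helpers are the natural candidates for kernels, while the scalar `alpha` and `beta` updates stay in the driver loop.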

**Design of Intelligent track circuit for fail safe operation of Railway Control Systems**

October 2012 – May 2013

Approved and funded by the Institute of Electrical and Electronics Engineers (IEEE).  
A track circuit, RF interrupt detection and serial communication using a TI MSP430 were used to get information about train occupancy.  
The status is communicated to the control room over the Railway Optical Fiber Communication Network, with reference to the existing architecture in Indian Railways.

**Design of Crossbar Switch using Xilinx LUT and FIFO logic**

January 2012 – May 2012

Designed an m\*n matrix switch in Xilinx using Verilog HDL.  
The m input lines coupled with n selector lines were used to produce a number of fusible links. The circuit was developed using LUT and FIFO logic, enabling dynamic interconnection networks.

PRAVEEN

#### Characterizing the Load/Store Architecture of AMD GPUs using Microbenchmarking

Designed and developed microbenchmarks in the OpenCL programming language to characterize the load/store architecture of AMD Radeon 7950 Tahiti GPU.  
Determined the structure of the memory hierarchy to obtain parameters like cache capacity, block size, associativity and cache access latency.  
Performed dependent memory accesses through pointer-chasing to analyze the wavefront scheduling and...

#### Design of 16-bit Pipelined RISC Processor

Implemented a 16-bit data path and control path based on the MIPS Instruction Set Architecture.  
Built on this to implement a pipelined processor with data forwarding and hazard detection units and verified its functionality.  
Implemented a branch prediction scheme to minimize control hazards.

#### System Software Projects

Developed a shell that interprets the commands of the Unix command line, with support for pipeline, redirection and background processes.  
Implemented a memory allocator library that mimics the function of malloc() and free().  
Developed a multi-threaded web server based on the HTTP protocol, with three different scheduling policies - FIFO, Smallest File First (SFF) and SFF with Bounded Starvation...

#### Parallelisation of RNA Folding Algorithm

Implemented a serial version of the RNA Folding algorithm to predict optimal secondary structures using dynamic programming and parallelized it using OpenMP and NVIDIA's CUDA C.  
Designed two different CUDA kernels and compared their performance. Obtained a speedup of 18x for an input sequence of length 8,000 and 40x for a sequence of length 16,000.

#### An API to Profile Code

Developed a C++ API that helps profile code by reducing programmer effort required to compute the data of interest.  
Provided methods to compute execution time, flop rate, bandwidth, analyze nested timers and print elegant reports.  
Implemented an ability to compare performance changes between multiple runs of the program.

#### Programmable Infinite Impulse Response Filter

Designed and implemented the Finite State Machine of an IIR Filter with programmable operation modes using Verilog.  
Realised the Multiplier-Accumulator unit using a Kogge-Stone adder and a Wallace tree multiplier and verified the filter's functionality using automated testbenches.  
Synthesized the design in Design Vision and performed area and timing analysis.

#### Currency Identifier for the blind

Designed an Arduino-based Indian currency identification system for the blind, which produces a voice output for each denomination, with an accuracy of more than 95%.

Naveen anand

**Pipeline Design of a Processor**

August 2013 – November 2013

Implemented a 16-bit pipelined RISC processor based on the MIPS Instruction Set Architecture.  
Implemented Data Forwarding and Hazard Detection units to handle data dependencies and verified their functionality.

**Characterizing the Load/Store Queue of AMD GPU using benchmarks (Ongoing)**

March 2014

We intend to characterize the load/store architecture of the AMD Radeon HD7970 GPU through microbenchmarks written in OpenCL. The results obtained from these benchmarks help to characterize the features of the machine. Our goal is to write microbenchmarks that help characterize the properties of the load/store units, particularly coalescing and the capacity of the load/store queues.

**Programmable Infinite Impulse Response Filter using Verilog**

August 2013 – December 2013

Implemented an IIR Filter as a Finite State Machine using Verilog.  
Synthesized the design in Design Vision using the TSMC 40nm general purpose library and performed area, timing and power analysis.  
Realised the floating-point Multiply-Accumulate unit using a Kogge-Stone adder and a Booth multiplier and verified it using automated testbenches.

**Operating Systems Libraries and Tools**

February 2014 – May 2014

Implemented malloc() and free() with worst-fit, best-fit and first-fit algorithms and compared their performance on different workloads.  
Modified the data structures used to improve storage performance while not adversely affecting allocation efficiency.  
Developed a shell that implemented the features of the UNIX shell and included add-on features to enable redirection, batch scripting and...
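
As a rough illustration of how the three placement policies named above differ, the sketch below picks a free block for a request under each policy. It is a simplified model and not the project's allocator: the free list is represented only by block sizes, and the function name `choose_block` and the policy strings are assumptions made for the example.

```python
def choose_block(free_sizes, request, policy):
    """Return the index of the free block to use, or None if nothing fits.

    free_sizes: sizes of the free blocks, in free-list order (hypothetical model).
    policy: "first", "best" or "worst".
    """
    # Only blocks large enough for the request are candidates.
    candidates = [(size, i) for i, size in enumerate(free_sizes) if size >= request]
    if not candidates:
        return None
    if policy == "first":
        # First fit: earliest block in list order that is big enough.
        return min(candidates, key=lambda c: c[1])[1]
    if policy == "best":
        # Best fit: smallest block that is big enough (ties: earliest).
        return min(candidates)[1]
    if policy == "worst":
        # Worst fit: largest block (ties: earliest).
        return max(candidates, key=lambda c: (c[0], -c[1]))[1]
    raise ValueError("unknown policy: " + policy)
```

With free blocks of sizes `[100, 30, 500, 40]` and a request of 35, first fit picks index 0, best fit picks index 3 and worst fit picks index 2, which is the kind of difference a workload comparison would expose.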

**Video Encryption using GPU**

November 2013 – December 2013

Used a Fermi GPGPU to encrypt different video frames in parallel.  
Used the OpenCV library and performed encryption using multi-key interleaving.  
Compared the performance improvements over the serial implementation using CUDA and OpenMP and obtained a 10x speedup.

Afrin

**Design of ALU in a RISC Microprocessor**

September 2013 – December 2013

- Designed the Schematic using Cadence.  
- Simulated the Pre-layout design using Spectre/Analog Artist.  
- Designed the layout with Cadence Virtuoso (LVS/DRC clean).  
- Performed parasitic extraction and re-simulation/characterization with extracted parasitics.

**Efficient implementation of Programmable Infinite Impulse Response filter in Verilog HDL**

September 2013 – December 2013

- Designed and implemented a PIIR filter with sub-modules such as a 16-bit fixed-point adder unit, a 16-bit fixed-point multiplier unit and a delay unit.  
- Synthesized the entire design using the Synopsys Design Vision tool.  
- Optimized the design to reduce area and improve performance.

**TCMS based FinFET buffer design for power efficient interconnects**

August 2013 – December 2013

- Implemented Fast Dynamic Buffer Insertion Scheme (Fast DBIS) algorithm using C++.  
- FinFET buffers based on the TCMS (Threshold voltage control through multiple supply voltages) principle were used for buffer insertion in the GSRC benchmarks to obtain power efficient interconnects.

**Implementation of Alarm Annunciators using FPGA**

January 2012 – May 2012

- Designed an alarm annunciator using an FPGA with the help of National Instruments CompactRIO.  
- The alarm annunciator implemented the International Society of Automation (ISA) sequences and was tested successfully on a real-time three-tank setup.

Swetha

##### [**Indian Institute of Technology, Madras**](https://www.linkedin.com/company/157267?trk=prof-exp-company-name)

May 2012 – June 2012 (2 months)

Internship at Indian Institute of Technology, Madras under Dr. Anil Prabhakar (May-July 2012)   
Cross-compilation and interfacing of an embedded processor to control laser drivers (ARM 9 and 11):  
• Worked with the Board Support Package (BSP) of ARM 9 and ARM 11  
• Worked with BSPs of different operating systems (Linux, Windows, Android) and with GNU tools (gdb, g++, gcc)  
• Linux shell scripting, cross-compilation, worked with device drivers  
• Serial port interfacing, embedded C++ programming, developing a GUI using C++, Android and Qtopia APIs to control laser drivers from the embedded processor

#### Implementation and Evaluation of Sandbox Prefetcher in gem5 simulator

January 2014 – May 2014

Implemented the Sandbox Prefetcher for L2 misses in the gem5 architectural simulator using C++ and Python. Different configurations of the prefetcher were implemented and compared with the Stride prefetcher using SimPoints. Both full system and system emulation modes were run.

#### Implementation of Programmable IIR Filter and Synthesis using Design Vision, Fall 2013 University of Wisconsin-Madison, Madison, WI

August 2013 – November 2013

A programmable IIR filter was designed using Verilog HDL with the necessary control logic and interfaces to enable its operation at an optimum frequency. The objective was to develop code that would synthesize to hardware with high throughput and low area, which included analysis of Design Vision synthesis techniques and robust coding.

3 team members

#### Power and Reliability-Aware Scheduling of Real-Time Periodic Tasks

January 2014 – May 2014

Developed a discrete event simulator in C++ to schedule tasks according to Earliest Deadline First (EDF) algorithm. Incorporated several static and dynamic power saving algorithms preserving reliability. Results for different utilizations and worst-case to best-case computation ratio were obtained and analyzed.

#### Parallel programming on the GPU

August 2013 – December 2013

CUDA software development for solving a dense banded system using shared memory, parallel programming and performance modelling. Included implementation of multiple kernel calls, synchronization barriers and scaling analysis for different algorithms.

#### Design of MIPS pipelined processor

Design in Quartus including pipelining, branch prediction and cache implementation. Included design of ALU, forwarding, register bypassing, stack operations and control unit.

#### Self-Sensing Control of Robotic Gripper

January 2013 – April 2013

Development of control algorithms for a Shape Memory Alloy actuated robotic gripper, including PI, Sliding Mode Control (SMC) and Adaptive SMC; computation and comparison of their performance indices; and development of a self-sensing control technique for the different control algorithms.

#### Product Design

July 2012 – November 2012

Real-time implementation using a microcontroller (8051) with infrared and temperature sensors to operate a fan in manual and automatic modes. Involved programming the microcontroller and interfacing. Data is received by the microcontroller, which sends its output to the motor driver and to a display that shows the fan speed.

Gayatri vishwa

#### Full Custom Design of Control Path and ALU Bypass Loop of an ARM RISC microprocessor (ECE 555)

August 2013 – December 2013

Implementation of the Control Path and a simplified critical path of the ALU Bypass Loop of an ARM RISC processor, using standard cells designed at 180nm. Completed all phases of design: schematic, pre-layout simulation, layout, and parasitic extraction and re-simulation. The design was optimized for performance, power and area.  
Tools: Cadence (Schematic), Cadence Virtuoso (Layout), Spectre/Analog...

#### Implementation of Programmable IIR Filter and Synthesis using Design Vision (ECE 551)

August 2013 – November 2013

Designed and synthesized a special-purpose DSP for a PIIR filter with a 16-bit fixed-point adder and multiplier, optimizing for area and performance. Designed a Booth-encoded Wallace multiplier and a Kogge-Stone adder and synthesized them to obtain the best possible area and timing results.  
Tools: Verilog Programming using ModelSim, Synthesis using DesignVision

#### Survey of Low-Power Cache Designs (ECE 752)

August 2013 – December 2013

A comprehensive survey of Low Power Cache Design Techniques covering Circuit-Level Techniques (State-Preserving and Non-State Preserving), Micro-architectural techniques, Compiler and OS-based techniques.

#### A comparison of Deterministic and Stochastic Logic for Image Processing Applications (ECE 753)

January 2014 – May 2014

A comparison of deterministic and stochastic level implementations for commonly used image processing applications such as gamma correction and edge detection in terms of area, delay, power and fault tolerance.   
Proposed a stochastic logic architecture for mean filtering (smoothing applications).  
Tools: Verilog Programming using ModelSim (for design and fault-injection), Synthesis using...

#### Implementation of A\* Router (ECE 556)

January 2014 – May 2014

Implemented an A\* router with Rip-up and Re-Route, Net Ordering and Pin Decomposition and optimizations for increased speed and reduced congestion.  
Tools: C++

#### Flexible M-QAM Modulator / Demodulator and Scalable FFT / IFFT for Cognitive Radio Systems

January 2013 – April 2013

Implemented a cognitive radio transceiver in Verilog HDL that complies with the WiFi and WiMAX standards. Achieved system reconfiguration (flexibility in the M-QAM modulator block and scalability in the IFFT block) through parameterized radio modules common to both standards, and optimized the IFFT block to reduce computation.

Rakesh rosha

#### Implementation of Router Framework for VLSI back end Automation, Spring 2014 University of Wisconsin-Madison, Madison, WI

January 2014 – May 2014

• A routing framework was developed in C++ to minimize overflow and execution time using pin decomposition, net ordering and the A\* routing algorithm, providing a good analysis of algorithm complexity, heuristics, and rip-up and reroute techniques.

#### Implementation of Custom Cell and Data Path Design of RISC Microprocessor, Fall 2013 University of Wisconsin-Madison, Madison, WI

August 2013 – December 2013

Implementation of the critical path of the ALU bypass loop of a typical ARM processor as a full custom design: schematic capture using Cadence, pre-layout simulation using Spectre/Analog Artist, and layout using Cadence Virtuoso (including parasitic extraction and characterization with extracted parasitics).

#### Implementation of Programmable IIR Filter and Synthesis using Design Vision, Fall 2013 University of Wisconsin-Madison, Madison, WI

August 2013 – November 2013

A programmable IIR filter was designed using Verilog HDL with the necessary control logic and interfaces to enable its operation at an optimum frequency. The objective was to develop code that would synthesize to hardware with high throughput and low area, which included analysis of Design Vision synthesis techniques and robust coding.

#### Final Year thesis- Wireless ECG for detecting Heart Rate Variability

January 2013 – May 2013

• A robust and efficient system was designed with an analog front end, wireless transmission and an FPGA back end for prototyping. The discrete wavelet transform was used for feature extraction, and detection of heart rate variability was implemented successfully.

#### Speech Recording using Intel Atom E6XX Processor

November 2011 – January 2012

• I achieved the serial transfer of speech samples from an AVR microcontroller to the Atom processor. By assembling the necessary hardware, my team demonstrated the Atom's ability to interface with various systems, which was instrumental in understanding its strengths and limitations.

#### Implementation of Speech Recognition on Cyclone II FPGA Board

May 2011 – July 2011

• I implemented the cochlear algorithm for speech recognition on the Cyclone II FPGA board. The algorithm was developed in Verilog HDL to increase the efficiency of the training and recognition modes and to develop and understand the psychoacoustic model.

#### A comparison of Deterministic and Stochastic Logic for Image Processing Applications (ECE 753)

January 2014 – May 2014

A comparison of deterministic and stochastic level implementations for commonly used image processing applications such as gamma correction and edge detection in terms of area, delay, power and fault tolerance.   
Proposed a stochastic logic architecture for mean filtering (smoothing applications).  
Tools: Verilog Programming using ModelSim (for design and fault-injection), Synthesis using...

Snehal

#### Embedded Stethoscope

August 2012 – April 2013

• Worked on a team of four students to design an electronic stethoscope to measure heart sounds.  
• Wrote the code for the C2000 controller using the CCS v5 software and designed and implemented filter and OP-AMP circuits using FilterPro and Proteus software.  
• Published two papers on this topic and presented one at an International Conference.

#### 16-bit single cycle data path and pipelined processor

November 2013

• Designed a 16-bit single cycle data path processor to implement 16 instructions using Altera Quartus.  
• Built a 5-stage pipelined data path processor with a forwarding unit to avoid data and control hazards.  
• Implemented a direct-mapped, write-back cache in Verilog.

#### Digital signal processor

November 2013

• Designed a special-purpose DSP to efficiently implement a Programmable Infinite Impulse Response (PIIR) filter using Verilog.  
• Different designs for Multiply and Accumulate unit were implemented to optimize timing as well as area performance.

#### Test generation, diagnosis and partial scan

November 2013

• Developed a complete test set for a given combinational circuit and performed fault coverage analysis using various ATPG tools.  
• Fault detection and diagnosis for the given virtual circuits.  
• Partial scan of sequential circuits using various methods like PODEM, FASTEST, etc.

#### Probabilistic model of performance parameters of Carbon Nanotube Field Effect Transistors (CNTFETs)

April 2014

• Compared the delay and power consumption of CMOS and CNTFETs using HSPICE.  
• Studied the types of faults that occur in CNTFETs due to process variations.  
• Developed a probabilistic model of the impact of these faults on performance parameters such as current tuning ratio, gate delay and noise margin, to assess the reliability of CNTFET circuits in the presence of faults.

#### Maze Routing Algorithm for CAD (Using C/C++)

April 2014

• Developed a routing algorithm for a given grid and set of nets, considering constraints on total wire length and edge overflow.  
• Used different graph algorithms like Dijkstra's algorithm, Lee’s algorithm, A\* search, etc.

#### Implementation of A\* Router (ECE 556)

January 2014 – May 2014

Implemented an A\* router with Rip-up and Re-Route, Net Ordering and Pin Decomposition and optimizations for increased speed and reduced congestion.  
Tools: C++

3 team members

Adithya

#### Low Noise Amplifier Design

Design of an LNA for Bluetooth radios operating in the ISM band (2.45 GHz) in TSMC CMOS 65 nm.

#### Integrated Circuit design for 8-bit ALU

Custom chip layout, DRC, extraction, functional verification and signal delay measurement using Cadence Virtuoso, HSPICE and CosmosScope

#### Digital Design Flow

Methodology for front-end design to back-end implementation of an up-down counter at System on Chip (SoC) level using different tools from Synopsys and Cadence

#### Interactive Vision Assistant

Development of an embedded system that assists the blind in outdoor navigation and provides intelligent, proactive environment feedback using the Terasic DE2i-150 FPGA board (Cornell Cup USA)

#### Design and simulation of analog circuits

Differential amplifiers, Feedback design and Op-amp frequency compensation using Cadence Spectre

#### Computer gesture control using MEMS inertial sensors

Undergraduate capstone project, Sponsor : Bigtec Pvt. Ltd.

#### Advanced driver assistance systems using MEMS IMU

Texas Instruments Analog Design Contest 2010, Tom Engibous prize 1st runners up

#### Vision based intelligent building automation

National Instruments VI Mantra 2010, Finalist

#### Embedded projects

Low Cost Electronic Tap (NXP LPC1100 Design Challenge, among top 10 finalists worldwide); Hill start assist (JK Tyre BAJA SAE India 2010 competition, 2nd prize for Low cost Car design); Vision based path planning robot (2nd Prize, Snake and Ladder robot event, Anokha 2011); Line maze solving robot (Finalist, Robotics competitions at IIT Madras and Sastra University); Remote controlled ATV (Finalist, Robotics competition at IIT Bombay); Rural LED lamp (Among top 8 finalists in Engenious event, IIT-Madras techfest, Shaastra 2008); Mini-FM station (Milestones, School science fair)